
Path workers, allow node operators to configure threads used#6667

Open
shortthefomo wants to merge 13 commits intoXRPLF:developfrom
shortthefomo:path-workers-2

@shortthefomo

High Level Overview of Change

Introduce a new configurable limit for pathfinding workers ([path_workers]), replacing the previous jt_update_pf_limit. This allows administrators to control the maximum number of concurrently running jtUPDATE_PF jobs in the JobQueue, which impacts path update and full order book update throughput. The limit is now capped at 3/4 of the configured [workers] (rounded down), with a minimum cap of 2, to prevent system overload while allowing better utilization of available threads.

Additionally, tie the number of pathfinding threads (mPathFindThread) dynamically to the configured workers, ensuring scalability.

Note that this change significantly increases a node's CPU usage when [path_workers] is raised above the default.
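As a rough illustration of the cap described above, the calculation can be sketched as follows. The function name `maxPathWorkers` is hypothetical and not taken from the PR; the sketch only assumes the stated rule (3/4 of workers, rounded down, never below 2).

```cpp
#include <algorithm>

// Hypothetical helper (not from the PR): the largest [path_workers] value
// allowed for a given worker count, per the rule in the description.
constexpr int
maxPathWorkers(int workers)
{
    // Integer division rounds down for non-negative inputs,
    // giving floor(3/4 * workers).
    int const threeQuarters = (workers * 3) / 4;
    // The cap itself never drops below 2.
    return std::max(2, threeQuarters);
}
```

With 14 workers this yields 10, matching the example given in the description.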

Context of Change

The pathfinding thread count is now tied to workers to avoid fixed low limits that don't scale. This improves performance on larger deployments while preventing resource exhaustion.

The core finding is that, as xrpld stands today, pathfinding and order book updates are blocked behind a single thread, so all other queued requests get stuck behind the slowest request.

It is recommended that pathfinding nodes run in memory mode, together with the patch in #6549.

Every effort has been made to ensure the node is not starved when servicing a large number of requests. Even so, a validator should not be configured to discover paths; this option is meant only for nodes set up for pathfinding.

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

No API impact - this is purely a configuration change.

Before / After

Before:

  • jt_update_pf_limit was limited to 1
  • Pathfinding threads were fixed at 1 and the job queue had a fixed minimum of 2, neither tied to workers
  • This effectively limited pathfinding to 1 thread: if a long-running pathfinding operation was submitted to the node, all other requests stacked up and everyone using the node had to wait for it.

After:

  • Config option: [path_workers] (default 2, max 3/4 of workers rounded down, min cap 2)
  • Limit cap: 3/4 of workers (min 2), enforced at runtime
  • Multiple threads can service requests, so a long-running pathfinding operation on the node no longer leaves everyone else waiting for it to complete.
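The before/after difference can be illustrated with a toy scheduling model. This is not xrpld code: `completionTimes` is a hypothetical function that assumes jobs are taken in FIFO order by whichever worker frees up first, and that at least one worker exists.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Illustrative model only (not xrpld code). Jobs are dispatched in FIFO
// order to whichever worker becomes free first. Returns the completion
// time of each job. Assumes workers >= 1.
std::vector<int>
completionTimes(std::vector<int> const& durations, int workers)
{
    // Min-heap of the times at which each worker next becomes free.
    std::priority_queue<int, std::vector<int>, std::greater<int>> freeAt;
    for (int i = 0; i < workers; ++i)
        freeAt.push(0);

    std::vector<int> done;
    done.reserve(durations.size());
    for (int d : durations)
    {
        int const start = freeAt.top();  // earliest available worker
        freeAt.pop();
        done.push_back(start + d);
        freeAt.push(start + d);
    }
    return done;
}
```

In this model, one slow job of 10 time units followed by three jobs of 1 unit completes at 10, 11, 12, 13 with a single worker, but at 10, 1, 2, 3 with two workers: the short requests no longer wait behind the slow one.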

Example: With [workers] = 14, max [path_workers] is now 10 (vs. previous 1).
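A config fragment for this example might look like the following. This is a sketch that assumes [path_workers] takes a single integer on the line after the section header, in the same way as the existing [workers] section; the PR's cfg/xrpld-example.cfg is the authoritative reference.

```
[workers]
14

[path_workers]
10
```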

Test Plan

  • Build verification: Code compiles successfully with CMake.
  • Config validation: Invalid values (e.g., > 3/4 workers) are rejected with clear error messages.
  • Runtime behavior: Server starts with valid configs and enforces limits dynamically.

To test: Set [path_workers] to a value > 3/4 of [workers] and verify startup failure with appropriate error.
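The validation being tested could look roughly like this. The function name, exception type, and message are illustrative assumptions rather than the PR's actual code, and rejecting values below 1 is also an assumption, since the description only specifies the upper bound.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Hypothetical validator (names and message are illustrative, not from
// the PR). Returns the value unchanged when it is within bounds.
int
validatePathWorkers(int pathWorkers, int workers)
{
    // Upper bound: 3/4 of workers rounded down, but never below 2.
    int const cap = std::max(2, (workers * 3) / 4);

    // Assumption: values below 1 are rejected as well.
    if (pathWorkers < 1 || pathWorkers > cap)
    {
        throw std::invalid_argument(
            "[path_workers] must be between 1 and " + std::to_string(cap) +
            " for " + std::to_string(workers) + " workers");
    }
    return pathWorkers;
}
```

Under these assumptions, `validatePathWorkers(11, 14)` throws, while `validatePathWorkers(10, 14)` returns 10.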

Future Tasks

None - this completes the pathfinding worker configurability improvements.

*I mixed up the branches I was working on and broke #6604, so this PR fixes that.

Contributor

Copilot AI left a comment

Pull request overview

Adds configurability for pathfinding-related concurrency by introducing a new [path_workers] config and wiring it through the JobQueue’s jtUPDATE_PF limit, so operators can scale path update / order book update throughput with available worker threads.

Changes:

  • Add [path_workers] config (default 2) with validation capped to max(2, floor(3/4 * effective workers)).
  • Pass configured path worker limit into JobQueue and enforce it via a per-job-type limit for jtUPDATE_PF.
  • Use the configured limit to bound LedgerMaster’s concurrent pathfinding work dispatch.
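A per-job-type limit of this kind can be sketched as a small counter that refuses to start a job once the limit is reached. `JobTypeLimiter` is a hypothetical stand-in for illustration; it does not reproduce how JobQueue or LedgerMaster actually track running jtUPDATE_PF jobs.

```cpp
#include <mutex>

// Hypothetical stand-in (not the JobQueue implementation): tracks how many
// jobs of one type are running and refuses to exceed a fixed limit.
class JobTypeLimiter
{
public:
    explicit JobTypeLimiter(int limit) : limit_(limit)
    {
    }

    // Returns true and counts the job as running if a slot is free.
    bool
    tryStart()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (running_ >= limit_)
            return false;
        ++running_;
        return true;
    }

    // Releases a slot when a job completes.
    void
    finish()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (running_ > 0)
            --running_;
    }

private:
    std::mutex mutex_;
    int const limit_;
    int running_ = 0;
};
```

A dispatcher would call `tryStart()` before queuing another path update and `finish()` when that job ends, so at most `limit` updates run concurrently.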

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/xrpld/core/detail/Config.cpp | Parse/validate [path_workers] against effective worker count |
| src/xrpld/core/ConfigSections.h | Add SECTION_PATH_WORKERS constant |
| src/xrpld/core/Config.h | Add PATH_WORKERS member defaulting to 2 |
| src/xrpld/app/main/Application.cpp | Compute effective worker threads and pass PATH_WORKERS into JobQueue ctor |
| src/xrpld/app/ledger/detail/LedgerMaster.cpp | Use PATH_WORKERS/JobQueue limit to cap concurrent jtUPDATE_PF dispatch |
| src/libxrpl/core/detail/JobQueue.cpp | Add ctor arg + enforce jtUPDATE_PF limit in getJobLimit() |
| include/xrpl/core/JobQueue.h | Expose new ctor parameter and getUpdatePathsJobLimit() accessor |
| cfg/xrpld-example.cfg | Document new [path_workers] section |

Comments suppressed due to low confidence (1)

include/xrpl/core/JobQueue.h:131

  • JobQueue is defined in a public header (include/xrpl/core/JobQueue.h) and this PR changes its constructor signature by adding updatePathsJobLimit. That is an externally visible API/ABI change (and a libxrpl change), so the PR description/checklist claiming “No API impact” looks inaccurate. Consider updating the PR metadata and/or preserving backward compatibility (e.g., overload or default parameter) if downstream code may construct JobQueue.
    JobQueue(
        int threadCount,
        int updatePathsJobLimit,
        beast::insight::Collector::ptr const& collector,
        beast::Journal journal,
        Logs& logs,
        perf::PerfLog& perfLog);

Comment on lines +823 to 825
# Maximum value is 3/4 of [workers] (with a minimum of 2).
#
#

Copilot AI Mar 27, 2026

The example config says the maximum is “3/4 of [workers]”, but the code enforces the limit against the effective job-queue worker count (which is auto-derived when [workers] isn’t explicitly set). To avoid misleading operators, consider documenting that the cap is based on the effective worker count (explicit [workers] or the auto-selected default).

Suggested change
- # Maximum value is 3/4 of [workers] (with a minimum of 2).
- #
- #
+ # Maximum value is 3/4 of the effective JobQueue worker count (explicit
+ # [workers] or the auto-selected default), with a minimum of 2.
+ #

Author

I think this is more confusing.
